Reactive Rebalancing for Scientific Simulations running on ExaScale High Performance Computers
Authors
Abstract
Exascale computers, the next generation of high performance computers, are expected to reach 1 exaflops around 2018. However, the processor cores used in these systems are very likely to suffer from high, unpredictable variability in performance. We built a prototype general-purpose reactive work rebalancer that handles such performance variability with low overhead. We validated it experimentally by developing a reactive rebalancer library in UPC and using it in a 5-point stencil (heat) simulation. The experiments show that our approach compensates for runtime processor speed variations with very limited overhead, with or without simulated processor slowdowns.
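The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' UPC library: the function names `heat_step` and `rebalance`, the coefficient `alpha`, and the policy of reassigning rows in proportion to measured speed are all assumptions made for illustration. The first function performs one sweep of a 5-point stencil on a 2D grid; the second takes the number of rows each worker handled and the time each worker needed, and returns new row counts so that slower workers receive less work.

```python
def heat_step(grid, alpha=0.25):
    """One explicit sweep of a 5-point heat stencil; boundary cells stay fixed.

    alpha is an assumed diffusion coefficient (hypothetical, not from the paper).
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbours = (grid[i - 1][j] + grid[i + 1][j]
                          + grid[i][j - 1] + grid[i][j + 1])
            new[i][j] = grid[i][j] + alpha * (neighbours - 4.0 * grid[i][j])
    return new


def rebalance(row_counts, elapsed):
    """Reassign rows in proportion to each worker's measured speed.

    row_counts[k] is the number of rows worker k processed in the last
    iteration and elapsed[k] the time it took (assumed to be positive).
    The total number of rows is preserved.
    """
    total = sum(row_counts)
    speeds = [n / t for n, t in zip(row_counts, elapsed)]  # rows per second
    total_speed = sum(speeds)
    shares = [total * s / total_speed for s in speeds]     # ideal, fractional
    new_counts = [int(x) for x in shares]
    # Hand out the rows lost to rounding, largest fractional remainder first.
    leftover = total - sum(new_counts)
    order = sorted(range(len(shares)),
                   key=lambda k: shares[k] - new_counts[k],
                   reverse=True)
    for k in order[:leftover]:
        new_counts[k] += 1
    return new_counts


if __name__ == "__main__":
    # Four workers with 25 rows each; the last one ran at half speed.
    print(rebalance([25, 25, 25, 25], [1.0, 1.0, 1.0, 2.0]))
```

In a real run, a step like `rebalance` would be triggered only when the measured times diverge, which is one plausible reading of "reactive"; the threshold and the cost of moving rows between workers are left out of this sketch.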
Similar resources
Exploring reliability of exascale systems through simulations
Exascale computers are predicted to emerge by the end of this decade with millions of nodes and billions of concurrent cores/threads. One of the most critical challenges for exascale computing is how to effectively and efficiently maintain the system reliability. Checkpointing is the state-of-the-art technique for high-end computing system reliability that has proved to work well for current pet...
LNCS 7851 - High Performance Computing for Computational Science - VECPAR 2012
The development of an exascale computing capability with machines capable of executing O(10^18) operations per second by the end of the decade will be characterized by significant and dramatic changes in computing hardware architecture from current (2012) petascale high-performance computers. From the perspective of computational science, this will be at least as disruptive as the transition from ...
Modeling and Simulation of Dynamic Applications for Exascale Computing Platforms
• Basics of experiment analysis with R is a plus. 1 Context: There is a continued need for higher compute performance: scientific grand challenges, engineering, geo-physics, bioinformatics, etc. Such studies used to be carried out on large ad hoc supercomputers, which, for economical reasons, were replaced by commodity clusters, i.e., sets of off-the-shelf computers interconnected by fast switche...
Exploring Energy Behaviors of I/O Management Approaches for Exascale Systems
The advent of fast, unprecedentedly scalable, yet energy-hungry exascale supercomputers poses a major challenge consisting in sustaining a high performance per watt ratio. While much recent work has explored new approaches to I/O management, aiming to reduce the I/O performance bottleneck exhibited by HPC applications (and hence to improve application performance), there is comparatively little...
Scalable and Highly Available Fault Resilient Programming Middleware for Exascale Computing
A hierarchical master-worker model is believed to be a promising programming paradigm that can achieve weak scaling on exascale-level high performance computers [1]. However, “fault resiliency” is one of the most important issues for exascale computing because the Mean Time Between Failure (MTBF) of such computers will be short [2]. We propose a fault resilient programming middleware called Fal...
Journal title:
Volume, issue:
Pages: -
Publication date: 2011